ABSTRACT
The global impact of the COVID-19 pandemic underlines the importance of developing machine learning (ML) approaches that can rapidly design therapeutics and prophylactics, such as antibodies/nanobodies, against novel viral infections despite limited training data and high sequence complexity. Here, we propose VAEResTL, a novel end-to-end deep generative model based on a convolutional Variational Autoencoder (VAE), a Residual Neural Network (ResNet), and Transfer Learning (TL), that can generate sequences of the third complementarity-determining region of the heavy chain (CDR-H3) for a novel target lacking sufficient training data. We further demonstrate that the proposed method generates CDR-H3 sequences for designing and developing therapeutic antibodies/nanobodies that can bind to different variants of SARS-CoV-2, despite the shortage of SARS-CoV-2 training data. The predicted CDR-H3 sequences are then screened and filtered for their developability parameters, namely viscosity, clearance, solubility, stability, and immunogenicity, through several in silico steps, resulting in a list of highly optimized lead candidates.